Abstract: The ith coordinate of an (n, k) code is said to have
locality r and availability t if there exist t disjoint groups, each
containing at most r other coordinates that can together recover
the value of the ith coordinate. This property is particularly
useful for codes for distributed storage systems because it permits
local repair and parallel access to hot data. In this paper,
for any positive integers r and t, we construct a binary linear
code of length $\binom{r+t}{t}$ which has locality r and availability t
for all coordinates. The information rate of this code attains
$\frac{r}{r+t}$, which is always higher than that of the direct product
code, the only known construction that can achieve arbitrary locality
and availability.

I. Introduction 

Nowadays various forms of data redundancy are used in
coding for distributed storage systems to ensure data integrity
and to improve performance. Among these, codes
with locality r have become an attractive subject since they were
proposed independently by Gopalan et al. [?], Oggier et al. [?],
and Papailiopoulos et al. [?]. More precisely, the ith
coordinate of a code is said to have locality r if the value at
this coordinate can be recovered by accessing at most r other
coordinates. In other words, associating each coordinate with
a storage node in a distributed storage system, an (n, k) code
with repair locality r ($r \ll k$) can greatly reduce the disk
I/O complexity of node repair. Considering data reliability
and storage efficiency, much work has studied upper bounds
on the minimum distance and information rate of such codes
[?]. Codes attaining these upper bounds
were constructed in [?], and some have even found
their way into practice [?], [16]. By now, a tight upper bound
has been proven for the information rate, while a complete
description of the tight upper bound on the minimum distance
remains open.

Things become more complicated when one considers
locality r in the case of multiple node erasures. First,
locality r that can tolerate up to $\delta - 1$ erasures is realized
in [?] by using inner error-correcting codes of
length at most $r + \delta - 1$ and minimum distance at least $\delta$.
Another approach is developed in [?], [22] by providing $\delta - 1$
disjoint local repair groups. A general framework for these
works is built in [23]. Besides specific code structures, the
methods of sequential local repair [?] and cooperative local
repair [?] are also used for multiple erasures.

Recently, the property of t-availability, which is related
to the structure of $\delta - 1$ disjoint local repair groups (e.g.,
$t = \delta - 1$ in [22]), is investigated further in [14]. This
property is particularly useful because it permits access to a
coordinate in multiple ways in parallel, which is appealing
in distributed storage systems with hot data.

Specifically, the ith coordinate of a code is said to have
locality r and availability t if there exist t disjoint groups,
each containing at most r other coordinates that can together
recover the value of the ith coordinate. Codes that have locality
r and availability t for information coordinates (e.g., all
systematic coordinates in a systematic linear code) are studied
in [?]. More precisely, the authors in [22] give an
upper bound on the minimum distance for any $[n, k, d]_q$ linear
code that has locality r and availability t for information
coordinates, i.e.,

$$d \le n - k + 2 - \left\lceil \frac{t(k-1)+1}{t(r-1)+1} \right\rceil .$$

They further prove the existence of codes attaining this upper
bound when $n \ge k(rt + 1)$. In [?], an upper bound is derived
for a special class of codes in which any repair group contains
only one parity symbol, and some explicit constructions attaining
this bound are given there.
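As a quick numerical illustration, the minimum distance bound for codes with locality r and availability t for information coordinates, $d \le n - k + 2 - \lceil (t(k-1)+1)/(t(r-1)+1) \rceil$, can be evaluated with integer arithmetic. The following is a minimal sketch; the function name `distance_upper_bound` and the sample parameters are illustrative assumptions and do not come from the cited work.

```python
def distance_upper_bound(n, k, r, t):
    """Evaluate n - k + 2 - ceil((t(k-1)+1) / (t(r-1)+1)) exactly."""
    num = t * (k - 1) + 1
    den = t * (r - 1) + 1
    ceil_quotient = (num + den - 1) // den  # ceiling division on positive integers
    return n - k + 2 - ceil_quotient

# Hypothetical parameters, chosen only to exercise the formula.
# For t = 1 the expression reduces to n - k + 2 - ceil(k / r).
print(distance_upper_bound(n=16, k=10, r=5, t=1))
print(distance_upper_bound(n=16, k=4, r=2, t=2))
```

Setting t = 1 gives a convenient sanity check, since the ceiling term then becomes $\lceil k/r \rceil$.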

In this paper we focus on codes that have locality r and
availability t for all coordinates, for which the only known
bounds are due to Tamo et al. [19], i.e.

$$\frac{k}{n} \le \frac{1}{\prod_{j=1}^{t} \left(1 + \frac{1}{jr}\right)} \qquad (1)$$

and

$$d \le n - \sum_{i=0}^{t} \left\lfloor \frac{k-1}{r^i} \right\rfloor .$$


These bounds are proven for all (linear or nonlinear) (n, k)
codes by using graph methods. There remains much work
to be done in this field, such as discussing the tightness of the
upper bounds and constructing explicit codes that are optimal
with respect to some upper bound. So far, the only
known construction of codes achieving any given locality r
and availability t is the direct product code (see, e.g., [?],
[22]), while other constructions are given for special values
of r and t. Table 1 lists almost all previous constructions of
codes that have locality r and availability t for all coordinates.
Code                                         Parameters                                                   Locality and availability

Direct product code                          $n = (r+1)^t$, $k = r^t$, $d = 2^t$                          $\forall r$, $\forall t$

[?]: Simplex code                            $n = 2^m - 1$, $k = m$, $d = 2^{m-1}$                        $r = 2$, $t = 2^{m-1} - 1$

[13, Example 1]: complete graph              $n = \binom{r+2}{2}$, $k = \binom{r+1}{2}$, $d = 3$          $\forall r$, $t = 2$

[21, Construction 3]: orthogonal partition   $n$, $k$, $d = n - m + 1$ †                                  $\forall r$, $t = 2$

[21, Construction 4]: tensor product matrix  $n = 2^{?} - 1$, $k = {?} \cdot n$, $d = 4$                  $r = 2$, $t = 3$

Construction in this paper                   $n = \binom{r+t}{t}$, $k = \binom{r+t-1}{t}$, $d = t + 1$    $\forall r$, $\forall t$

Table 1


† In [21], the specific values of n, k and m depend on the corresponding
orthogonal partition and the encoding map, and the constraints are too
complicated to be stated here.

A. Our Contribution

For any positive integers r and t, we construct a linear
code of length $\binom{r+t}{t}$ which has locality r and availability t
for all coordinates. Besides arbitrary locality and availability
for all coordinates, the following aspects make the code more
desirable.

(1) The code is over the binary field, which allows efficient
implementation in practice.

(2) Its information rate attains $\frac{r}{r+t}$, which is always higher
than that of the direct product code. Although no new
bound on the information rate is established in this work,
detailed comparisons with previous constructions
and some related bounds lead us to believe that our code has
near-optimal information rate when t is not too large (say,
t < r).

B. Organization

Section II introduces formal definitions of locality and
availability, as well as some basic concepts about block design.
Section III presents the code construction and reveals its
relation with the block design. Section IV gives comparisons with
previous constructions and information rate bounds. Section V
concludes the paper.

II. Locality and Availability

Let C be an $[n, k, d]_q$ linear code with generator matrix
$G = (g_1, \ldots, g_n)$, where $g_i$ is a k-dimensional column vector over
$\mathbb{F}_q$ for $1 \le i \le n$. Denote $[m] = \{1, 2, \ldots, m\}$ for any
positive integer m. Then the locality r and availability t for
linear codes are formally defined below.

Definition 1. The ith coordinate, $1 \le i \le n$, of an $[n, k, d]_q$
linear code C is said to have locality r and availability t if
there exist t disjoint subsets $R_1^{(i)}, \ldots, R_t^{(i)} \subseteq [n] \setminus \{i\}$
such that for $1 \le j \le t$,

(1) $|R_j^{(i)}| \le r$, and

(2) $g_i$ is an $\mathbb{F}_q$-linear combination of $\{g_l\}_{l \in R_j^{(i)}}$.

In this paper, we prove locality r and availability t by
verifying some equivalent conditions on the dual code. The
details can be found in the proof of Theorem 5. We also
need some basic concepts of block design.

Definition 2. Let X be a v-set (i.e. a set with v elements),
whose elements are called points. A $t$-$(v, k, \lambda)$ design is a
collection of distinct k-subsets (called blocks) of X with the
property that any t-subset of X is contained in exactly $\lambda$
blocks.

Given a $t$-$(v, k, \lambda)$ design with v points $P_1, \ldots, P_v$ and b
blocks $B_1, \ldots, B_b$, its $b \times v$ incidence matrix $A = (a_{ij})$ is
defined by

$$a_{ij} = \begin{cases} 1, & \text{if } P_j \in B_i \\ 0, & \text{if } P_j \notin B_i \end{cases}$$

where $1 \le i \le b$ and $1 \le j \le v$.
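The definition of a design and of its incidence matrix can be made concrete with a few lines of code. The sketch below is only an illustration: the helper names `incidence_matrix` and `is_t_design` are hypothetical, and the Fano plane, a 2-(7, 3, 1) design, is used as a familiar test case that does not appear in this paper.

```python
from itertools import combinations

def incidence_matrix(points, blocks):
    """b x v matrix with entry 1 exactly when points[j] lies in blocks[i]."""
    return [[1 if p in blk else 0 for p in points] for blk in blocks]

def is_t_design(points, blocks, t, lam):
    """Check that every t-subset of the points lies in exactly lam blocks."""
    return all(
        sum(1 for blk in blocks if set(sub) <= blk) == lam
        for sub in combinations(points, t)
    )

# Fano plane: 7 points, 7 blocks of size 3, every pair in exactly one block.
fano_points = list(range(1, 8))
fano_blocks = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
               {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
A = incidence_matrix(fano_points, fano_blocks)
print(is_t_design(fano_points, fano_blocks, t=2, lam=1))
```

Here rows of `A` index blocks and columns index points, matching the $b \times v$ convention of the definition.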


III. Code Construction 

The code is constructed by defining its parity check matrix.
For any positive integers r and t, let m = r + t. In the
following, we define a matrix over $\mathbb{F}_2$, denoted as $H(m,t)$,
containing $\binom{m}{t-1}$ rows and $\binom{m}{t}$ columns. Each row of $H(m,t)$
is associated with a $(t-1)$-subset of [m] and each column
of $H(m,t)$ is associated with a t-subset of [m]. Given the
elements in [m] ordered as $1 \prec 2 \prec \cdots \prec m$, we arrange
the rows (also columns) of $H(m,t)$ in the lexical order of
the associated subsets of [m]. More precisely, for any two
subsets $E, F \subseteq [m]$ with the elements in each subset sorted
in the order $\prec$, E is before F if and only if for the first
elements where E and F differ, say a in E and b in F, it holds
$a \prec b$. In this order, for $1 \le i \le \binom{m}{t-1}$ and $1 \le j \le \binom{m}{t}$,
suppose the ith row is associated with the subset $E_i$ and the
jth column is associated with the subset $F_j$; then the (i, j)th
element $h_{ij}$ of $H(m,t)$ is defined as follows:

$$h_{ij} = \begin{cases} 1, & \text{if } E_i \subseteq F_j \\ 0, & \text{if } E_i \nsubseteq F_j \end{cases}$$

Let us give an example of $H(m,t)$. Suppose t = 3 and m = 5.
The matrix $H(5,3)$ is given in Fig. 1. Actually, the matrix
$H(m,t)$ is the parity check matrix of the code (denoted as C)
we construct in this section. Next we prove some properties
of $H(m,t)$ to help understand the code C.
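The definition of $H(m,t)$ translates directly into code: enumerate the $(t-1)$-subsets and t-subsets of [m] in lexical order and test containment. The following is a minimal sketch; the function name `build_H` is a hypothetical helper, and it relies on the fact that `itertools.combinations` emits subsets of a sorted range in lexicographic order.

```python
from itertools import combinations

def build_H(m, t):
    """Return (H, rows, cols) for H(m, t) over F_2.

    rows: (t-1)-subsets of [m]; cols: t-subsets of [m]; both in lexical order.
    H[i][j] is 1 exactly when rows[i] is contained in cols[j].
    """
    rows = list(combinations(range(1, m + 1), t - 1))
    cols = list(combinations(range(1, m + 1), t))
    H = [[1 if set(E) <= set(F) else 0 for F in cols] for E in rows]
    return H, rows, cols

# The example from the text: t = 3, m = 5.
H, rows, cols = build_H(5, 3)
for E, row in zip(rows, H):
    print(E, "".join(map(str, row)))
```

For m = 5 and t = 3 this yields a 10 x 10 matrix whose columns are labelled by the 3-subsets {1,2,3}, {1,2,4}, ..., {3,4,5}.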

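Lemma 3. For m > t > 1, the matrix $H(m,t)$ is of the block
form

$$H(m,t) = \begin{pmatrix} H(m-1,t-1) & 0 \\ I_{\binom{m-1}{t-1}} & H(m-1,t) \end{pmatrix} \qquad (2)$$

where $I_{\binom{m-1}{t-1}}$ is the unit matrix of size $\binom{m-1}{t-1}$ and 0 is a
zero matrix. In particular, for m = t, $H(m,t) = H(m,1)^T =
(1, \ldots, 1)^T$, which is an all-one column vector of dimension m.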


Proof: It is obvious that $H(m,1)^T = H(m,m) = (1, \ldots, 1)^T$.
For m > t > 1, according to the order
in which the rows (and columns) of $H(m,t)$ are arranged, the
upper left block (i.e. the first $\binom{m-1}{t-2}$ rows and the first
$\binom{m-1}{t-1}$ columns) corresponds to the subsets of [m] containing
1. Actually this block can be regarded as one defined over the
set $\{2, 3, \ldots, m\}$ with columns corresponding to $(t-1)$-subsets
and rows corresponding to $(t-2)$-subsets. Thus this block is
the matrix $H(m-1,t-1)$. The upper right block of $H(m,t)$
is obviously 0. Each row of the bottom left block corresponds
to a $(t-1)$-subset of $[m] \setminus \{1\}$, which is contained in exactly
one t-subset of [m] containing 1. Moreover, because the subsets
are sorted in the lexical order, the bottom left block is the unit
matrix of size $\binom{m-1}{t-1}$. Similarly to the upper left block, one can
see that the bottom right block is $H(m-1,t)$. ■

Fig. 1. The matrix H(5,3)
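The block form stated in Lemma 3 can be checked numerically for small parameters. The sketch below assumes a hypothetical helper `build_H` that follows the lexical-order definition of $H(m,t)$, and a hypothetical checker `has_block_form`; it is a sanity check on examples, not a substitute for the proof.

```python
from itertools import combinations
from math import comb

def build_H(m, t):
    """H(m, t): rows are (t-1)-subsets of [m], columns are t-subsets, lexical order."""
    rows = list(combinations(range(1, m + 1), t - 1))
    cols = list(combinations(range(1, m + 1), t))
    return [[1 if set(E) <= set(F) else 0 for F in cols] for E in rows]

def has_block_form(m, t):
    """Compare the four blocks of H(m, t) with the claim of Lemma 3 (m > t > 1)."""
    H = build_H(m, t)
    a = comb(m - 1, t - 2)  # number of rows whose subset contains the element 1
    b = comb(m - 1, t - 1)  # number of columns whose subset contains the element 1
    upper_left = [row[:b] for row in H[:a]]
    upper_right = [row[b:] for row in H[:a]]
    lower_left = [row[:b] for row in H[a:]]
    lower_right = [row[b:] for row in H[a:]]
    identity = [[int(i == j) for j in range(b)] for i in range(b)]
    return (
        upper_left == build_H(m - 1, t - 1)
        and all(x == 0 for row in upper_right for x in row)
        and lower_left == identity
        and lower_right == build_H(m - 1, t)
    )

print(all(has_block_form(m, t) for m in range(3, 8) for t in range(2, m)))
```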

When t > r + 1, the matrix $H(m = r + t, t)$ has more rows
than columns. Actually, the following lemma states that the
rows of $H(m,t)$ are linearly dependent, so some rows can be
deleted when it is regarded as a parity check matrix.

Lemma 4. For the block decomposition of $H(m,t)$ as shown
in (2), each row in the upper block (i.e. $(H(m-1,t-1) \;\; 0)$)
is an $\mathbb{F}_2$-linear combination of rows in the bottom block
(i.e. $(I_{\binom{m-1}{t-1}} \;\; H(m-1,t))$). Consequently, $\mathrm{rank}\, H(m,t) = \binom{m-1}{t-1}$.

Proof: For any row (denoted as h) in the upper block,
suppose it is associated with a $(t-1)$-subset $\{1, a_1, \ldots, a_{t-2}\}$
where $a_i \in [m] \setminus \{1\}$ for $1 \le i \le t-2$. Denote $[m] =
\{1, a_1, \ldots, a_{t-2}\} \cup \{b_1, \ldots, b_{m-t+1}\}$. Then the row h has 1's in
the columns associated with the subsets $\{1, a_1, \ldots, a_{t-2}, b_j\}$,
$1 \le j \le m-t+1$. As a result, the left part of h is the sum
of the left parts of the rows in the bottom block associated
with the subsets $\{a_1, \ldots, a_{t-2}, b_j\}$, $1 \le j \le m-t+1$. For
simplicity, the collection of these rows is denoted by R. We
only need to show that the sum (in $\mathbb{F}_2$) of the right parts of the
rows in R is 0.

Focusing on the right part, one can see that only the columns
associated with the subsets $\{a_1, \ldots, a_{t-2}, b_i, b_j\}$, $1 \le i < j \le
m-t+1$, have 1's in the rows in R. Furthermore, for
each of these columns, say the column associated with
$\{a_1, \ldots, a_{t-2}, b_i, b_j\}$, there are exactly two rows in R, i.e. the
row associated with $\{a_1, \ldots, a_{t-2}, b_i\}$ and the row associated
with $\{a_1, \ldots, a_{t-2}, b_j\}$, which have 1 in that column.
Consequently, the sum (in $\mathbb{F}_2$) of the right parts of the rows in R is 0. Since
the rows in the bottom block of $H(m,t)$ are obviously linearly
independent, it follows that $\mathrm{rank}\, H(m,t) = \binom{m-1}{t-1}$. ■
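The rank formula of Lemma 4 can be confirmed on small cases by Gaussian elimination over $\mathbb{F}_2$. In the sketch below, `build_H` and `gf2_rank` are hypothetical helper names; rows are packed into Python integers so that row addition in $\mathbb{F}_2$ becomes a bitwise XOR.

```python
from itertools import combinations
from math import comb

def build_H(m, t):
    """H(m, t): rows are (t-1)-subsets of [m], columns are t-subsets, lexical order."""
    rows = list(combinations(range(1, m + 1), t - 1))
    cols = list(combinations(range(1, m + 1), t))
    return [[1 if set(E) <= set(F) else 0 for F in cols] for E in rows]

def gf2_rank(matrix):
    """Rank over F_2 of a 0/1 matrix given as a list of rows."""
    vecs = [int("".join(map(str, row)), 2) for row in matrix]
    rank = 0
    while vecs:
        pivot = vecs.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            # Clear that bit from every remaining row that has it.
            vecs = [v ^ pivot if v & low else v for v in vecs]
    return rank

for m, t in [(4, 2), (5, 3), (6, 3), (6, 4)]:
    print((m, t), gf2_rank(build_H(m, t)), comb(m - 1, t - 1))
```

Each printed line shows the computed rank next to $\binom{m-1}{t-1}$ for comparison.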

It is easy to verify that $\mathrm{rank}\, H(m,m) = \mathrm{rank}\, H(m,1) = 1$,
which coincides with the result in Lemma 4. Then the parity
check matrix of the code C can be taken as

$$H = \begin{pmatrix} I_{\binom{m-1}{t-1}} & H(m-1,t) \end{pmatrix}. \qquad (3)$$

Therefore C is of length $\binom{m}{t} = \binom{r+t}{t}$ and information rate
$1 - \binom{m-1}{t-1} / \binom{m}{t} = \frac{r}{r+t}$. In the following, we continue
to investigate the locality and availability of the code C.
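The simplification of the information rate uses only the identity $\binom{m}{t} = \frac{m}{t}\binom{m-1}{t-1}$ together with m = r + t. Written out as a short worked derivation (the numerical example at the end is ours, chosen to match the matrix H(5,3)):

```latex
\[
\frac{k}{n}
  = 1 - \frac{\binom{m-1}{t-1}}{\binom{m}{t}}
  = 1 - \frac{t}{m}
  = \frac{m-t}{m}
  = \frac{r}{r+t},
\qquad \text{since } \binom{m}{t} = \frac{m}{t}\binom{m-1}{t-1}
\text{ and } m = r + t.
\]
% Example: r = 2, t = 3 gives m = 5, n = \binom{5}{3} = 10,
% rank H = \binom{4}{2} = 6, k = 10 - 6 = 4, and rate 4/10 = 2/5 = r/(r+t).
```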

Theorem 5. The code C which has the parity check matrix
$H(r+t,t)$ (or H as defined in (3)) has locality r and
availability t for all coordinates.

Proof: It is equivalent to prove that for each coordinate
$i \in [\binom{r+t}{t}]$ there exist t codewords in the dual code, say
$c_1, \ldots, c_t$, such that $|\mathrm{supp}\, c_j| = r+1$ and $\mathrm{supp}\, c_j \cap \mathrm{supp}\, c_l =
\{i\}$ for $1 \le j \ne l \le t$. Actually, we will see the rows of
$H(r+t,t)$ are exactly these codewords.

First, for any row in $H(r+t,t)$, suppose it is associated
with a $(t-1)$-subset $E \subseteq [r+t]$. Since E is contained in
$r + t - (t-1) = r+1$ t-subsets of $[r+t]$, this row has r+1
1's. Namely, the support of each row is of size r+1.

Then, for each coordinate $i \in [\binom{r+t}{t}]$ which corresponds to
a column of $H(r+t,t)$, suppose this column is associated
with a t-subset $F_i \subseteq [r+t]$. Because $F_i$ contains t $(t-1)$-subsets,
there are t rows which have 1 in this column. We
claim that excluding the coordinate i, the supports of these t rows
are pairwise disjoint. Otherwise, assume there are two rows,
say the jth row (denoted as $h_j$, associated with the subset $E_j$)
and the lth row (denoted as $h_l$, associated with the subset $E_l$),
such that $\{i, u\} \subseteq \mathrm{supp}\, h_j \cap \mathrm{supp}\, h_l$ for some
$u \in [\binom{r+t}{t}] \setminus \{i\}$. It implies that

$$E_j \subseteq F_i \cap F_u \quad \text{and} \quad E_l \subseteq F_i \cap F_u .$$

As a result, $E_j \cup E_l \subseteq F_i \cap F_u$. But the union (resp.
intersection) of two different $(t-1)$-subsets (resp. t-subsets)
is of size at least t (resp. at most t-1), which leads to a
contradiction. Therefore, the t rows are exactly the codewords
$c_1, \ldots, c_t$ we need, which completes the proof. ■
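The argument in the proof of Theorem 5 is constructive: the repair groups of a coordinate are read off from the rows of $H(r+t,t)$ that have a 1 in the corresponding column. The following sketch enumerates these groups; the function name `repair_groups` and the 0-based coordinate indexing are illustrative assumptions.

```python
from itertools import combinations

def repair_groups(r, t):
    """Map each coordinate index to its t repair groups.

    Coordinates are the t-subsets F of [r+t] in lexical order. Every
    (t-1)-subset E of F gives a row of H(r+t, t) with a 1 in column F;
    the other coordinates in the support of that row form one group.
    """
    m = r + t
    cols = list(combinations(range(1, m + 1), t))
    index = {F: i for i, F in enumerate(cols)}
    groups = {}
    for F in cols:
        i = index[F]
        groups[i] = []
        for E in combinations(F, t - 1):
            rest = [x for x in range(1, m + 1) if x not in E]
            support = {index[tuple(sorted(E + (x,)))] for x in rest}
            groups[i].append(support - {i})
    return groups

g = repair_groups(2, 3)
print(g[0])  # the three repair groups of the coordinate labelled {1, 2, 3}
```

Each group has exactly r members, and the groups of a fixed coordinate are pairwise disjoint, which is the content of the theorem.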

Finally we determine the minimum distance of the code C
from its parity check matrix H given in (3). Since each column in
the right part of H, i.e. $H(m-1,t)$, has t 1's and the left part
of H is a unit matrix, there exist t + 1 columns in H which
are linearly dependent. Thus the minimum distance of C is at
most t + 1. On the other hand, from the availability t one can
see that any t erasures are recoverable for C, thus the minimum
distance is at least t + 1. Therefore the minimum distance of
C is t + 1.
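For small parameters the value t + 1 of the minimum distance can be confirmed by exhaustive search over the null space of $H(r+t,t)$. The sketch below uses hypothetical helpers `build_H` and `min_distance`; the search is exponential in the code length and is meant only for lengths of about 15 or less.

```python
from itertools import combinations

def build_H(m, t):
    """H(m, t): rows are (t-1)-subsets of [m], columns are t-subsets, lexical order."""
    rows = list(combinations(range(1, m + 1), t - 1))
    cols = list(combinations(range(1, m + 1), t))
    return [[1 if set(E) <= set(F) else 0 for F in cols] for E in rows]

def min_distance(H):
    """Smallest weight of a nonzero binary vector x with H x = 0 over F_2."""
    n = len(H[0])
    masks = [int("".join(map(str, row)), 2) for row in H]
    best = n + 1  # returned unchanged only if the code is the zero code
    for x in range(1, 1 << n):
        # x is a codeword when it meets every parity check in an even number of positions.
        if all(bin(x & mk).count("1") % 2 == 0 for mk in masks):
            best = min(best, bin(x).count("1"))
    return best

for r, t in [(2, 2), (2, 3), (3, 2), (1, 3)]:
    print((r, t), min_distance(build_H(r + t, t)), t + 1)
```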

A. Relation with the block design

Actually, the matrix $H(r+t,t)$ can be viewed as an incidence
matrix of a 1-$(\binom{r+t}{t}, r+1, t)$ design. Moreover, suppose
the blocks are $B_1, \ldots, B_b$, where $b = \binom{r+t}{t-1}$; then
it holds

$$|B_i \cap B_j| \le 1 \quad \text{for } 1 \le i < j \le b. \qquad (4)$$

In other words, once we find a 1-$(n, r+1, t)$ design with
blocks satisfying the condition (4), it immediately yields a
linear code of length n with locality r and availability t by
taking its incidence matrix as the parity check matrix of the
code. For the resulting code to have high information rate,
the incidence matrix needs to have low rank compared with
its number of columns. However, it is difficult to find such 1-designs,
and constructing those with incidence matrices of low rank is even
harder. Our construction of the matrix $H(r+t,t)$ provides a
good way to do so. Besides, some constructions from
geometry are also feasible. For example, the following matrix
H gives a 1-(9, 3, 2) design satisfying the property (4), i.e.

$$H = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix}$$

Then it induces a binary linear code with locality r = 2 and
availability t = 2, but its information rate is $\frac{4}{9}$, which is less
than $\frac{r}{r+t} = \frac{1}{2}$. In fact, this matrix H corresponds to the direct
product code construction. More details can be found in the
next section.
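The properties claimed for the 6 x 9 example can be verified mechanically: every block has size 3, every point lies in 2 blocks, distinct blocks share at most one point, and the rank over $\mathbb{F}_2$ gives the information rate. The sketch below transcribes the matrix row by row; the helper name `gf2_rank` is a hypothetical choice.

```python
from fractions import Fraction
from itertools import combinations

# The 6 x 9 incidence matrix of the example (rows are blocks, columns are points).
H = [
    [1, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
]

def gf2_rank(matrix):
    """Rank over F_2 by elimination on rows packed into integers."""
    vecs = [int("".join(map(str, row)), 2) for row in matrix]
    rank = 0
    while vecs:
        pivot = vecs.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot
            vecs = [v ^ pivot if v & low else v for v in vecs]
    return rank

n = len(H[0])
blocks = [{j for j, x in enumerate(row) if x} for row in H]
is_design = (all(len(B) == 3 for B in blocks)
             and all(sum(row[j] for row in H) == 2 for j in range(n)))
small_overlap = all(len(A & B) <= 1 for A, B in combinations(blocks, 2))
rate = Fraction(n - gf2_rank(H), n)
print(is_design, small_overlap, rate)
```

The rank is 5 rather than 6 because the first three rows and the last three rows both sum to the all-one vector.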



Fig. 3. Comparison of the information rate for r = 2, $1 \le t \le 16$.


IV. Comparisons with Other Constructions and 
Information Rate Bounds 

The information rate and minimum distance are two important
parameters for evaluating an error-correcting code. It is
well known that a tradeoff exists between these two parameters.
Although the minimum distance of the code constructed
in Section III is t + 1, which is the lowest for a locally
repairable code with availability t, we will see that it performs
well in information rate through the following comparisons.


A. Comparison with Other Constructions 

1) The direct product code: The direct product code, see
e.g., [?], is another code that can achieve arbitrary
locality and availability. Specifically, the direct product of t
binary (r+1, r) single-parity-check codes induces a code
with locality r, availability t and information rate $\left(\frac{r}{r+1}\right)^t$.

The code we constructed in Section III also achieves locality
r and availability t, but has information rate $\frac{r}{r+t}$. Because
$\left(1 + \frac{1}{r}\right)^t > 1 + \frac{t}{r}$ for all t > 1, it follows that
$\left(\frac{r}{r+1}\right)^t = 1 / \left(1 + \frac{1}{r}\right)^t < 1 / \left(1 + \frac{t}{r}\right) = \frac{r}{r+t}$.
Thus, when t > 1, our code
always has higher information rate than the direct product code
with the same locality r and availability t. Fig. 2 and Fig. 3
display the information rate of the codes as a function of t.
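The comparison between the two rates, $\frac{r}{r+t}$ for our construction and $\left(\frac{r}{r+1}\right)^t$ for the direct product code, is easy to tabulate with exact rational arithmetic. The sketch below is illustrative only; the function names `rate_ours` and `rate_product` are hypothetical.

```python
from fractions import Fraction

def rate_ours(r, t):
    """Information rate r / (r + t) of the code with parity check matrix H(r+t, t)."""
    return Fraction(r, r + t)

def rate_product(r, t):
    """Information rate (r / (r + 1))^t of the direct product of t SPC codes."""
    return Fraction(r, r + 1) ** t

# Rates for r = 2 and small t; the two coincide at t = 1.
for t in range(1, 6):
    print(t, rate_ours(2, t), rate_product(2, t))
```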


2) Prakash et al.'s construction: Recently, Prakash et al.
[13] presented a construction of locally 2-reconstructible codes
by using Turán graphs. It was also shown in [13] that the
resulting codes have 2-availability when complete graphs are used
instead of Turán graphs. In particular, a complete graph
with v vertices induces a binary code with locality r = v - 2
and availability t = 2. This code has length $\binom{r+2}{2}$
and information rate $\frac{r}{r+2}$. In fact, after a permutation of the
columns of the parity check matrix constructed there,
one gets exactly the matrix $H(r+2,2)$. That is, the code
is equivalent to our construction for the very special case of
t = 2, while our work develops a general construction for
arbitrary t.

B. Comparison with Information Rate Bounds 

The only known bound on the information rate of locally
repairable codes with availability t for all coordinates is due
to Tamo et al. [19] (see the bound (1) stated in Section I).
There does exist a gap between this upper bound and the rate
we have achieved (i.e. $\frac{r}{r+t}$). But the following remarks help
put this gap in perspective.

(1) To our knowledge, no codes attaining the bound (1) have
been given (except for the special case of t = 1). It was
pointed out in [?] that the information rate of the direct
product code is close to the bound (1) for t = 2. As we
have seen, our code outperforms the direct product code
in the information rate. Thus we get even closer to the
upper bound (1).

Actually, in [13] Prakash et al. derived an upper bound
for linear locally 2-reconstructible codes, i.e.,

$$\frac{k}{n} \le \frac{r}{r+2}. \qquad (5)$$

Since locally repairable codes with availability 2 are a
special class of locally 2-reconstructible codes, the bound
(5) also applies to the code we consider in this paper
for the case t = 2. On the other hand, our construction
(also the construction given in [13]) proves the tightness
of the upper bound (5). As a comparison, when t = 2
the upper bound (1) exceeds $\frac{r}{r+2}$ by $O\left(\frac{1}{r}\right)$.

(2) Our code is binary, while the bound (1) is proved
for all codes with locality r and availability t. It is not
surprising that the field size sometimes limits the information
rate of a code.

(3) An upper bound on the information rate of codes with
$(r, \delta)$ locality is given in [?], i.e. $\frac{k}{n} \le \frac{r}{r+\delta-1}$. The
locality $(r, \delta)$ as introduced in [?] maintains the locality
r even in the case of $\delta - 1$ erasures. It is easy to see
that the requirement of locality r and availability t also
guarantees the locality r in the case of t erasures. By
letting $t = \delta - 1$, our code attains the upper bound for
the codes with $(r, \delta)$ locality. Therefore, it is reasonable
to believe that the information rate $\frac{r}{r+t}$ is near
optimal for codes with locality r and availability t,
especially when t is not too large (for example,
t < r). As the rate comparison figures suggest, we believe the curve of
our code is closer to the optimal curve (which we have
not determined) than the bound (1) is.

(4) However, for large t, there do exist codes whose
information rate exceeds $\frac{r}{r+t}$. For example, the binary
simplex code of length $n = 2^m - 1$ has locality r = 2 and
availability $t = 2^{m-1} - 1$. Its information rate is $\frac{m}{2^m - 1}$,
greater than $\frac{r}{r+t} = \frac{2}{2^{m-1}+1}$ for all $m \ge 3$. A comparison
between all these codes and bounds is shown in Fig. 3.
Therefore, to characterize the optimal information rate for
codes with locality and availability, we still have a long
way to go.
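The simplex code comparison in remark (4) can be checked with exact arithmetic: for locality r = 2 and availability $t = 2^{m-1} - 1$, the rate $\frac{m}{2^m - 1}$ of the simplex code is compared with $\frac{r}{r+t} = \frac{2}{2^{m-1}+1}$. The function names in this sketch are hypothetical.

```python
from fractions import Fraction

def simplex_rate(m):
    """Information rate m / (2^m - 1) of the binary simplex code."""
    return Fraction(m, 2**m - 1)

def our_rate_at_simplex_parameters(m):
    """Rate r / (r + t) of our construction with r = 2 and t = 2^(m-1) - 1."""
    r, t = 2, 2 ** (m - 1) - 1
    return Fraction(r, r + t)

for m in range(2, 8):
    print(m, simplex_rate(m), our_rate_at_simplex_parameters(m))
```

The two rates coincide at m = 2 (where t = 1), and the simplex code is strictly better from m = 3 onward.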


V. Conclusions 

In this paper we construct a binary linear code with arbitrary
locality r and availability t. The code always has higher
information rate than the direct product code, which is the only
other known construction with the same property. Besides, it attains
the optimal information rate at t = 2. This construction reveals
a connection with special block designs, which may help to
obtain more results on codes with locality and availability.